[Metrics] move prompt_tokens_total report to main process by liyonghua0910 · Pull Request #7982 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-06-02T12:10:19Z

Motivation

Move prompt token related metrics reporting from the API-side EngineClient to the engine main process, so prompt_tokens_total is reported from the process that owns the main metrics collector.

Modifications

Move prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens reporting from fastdeploy/entrypoints/engine_client.py to fastdeploy/engine/common_engine.py.
Remove the unused main_process_metrics import from fastdeploy/entrypoints/engine_client.py.

Usage or Command

pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py

Accuracy Tests

N/A. This PR only changes metrics reporting location and does not change model output logic.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

codecov-commenter · 2026-06-02T12:46:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@529ec9e).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7982   +/-   ##
==========================================
  Coverage           ?   67.73%           
==========================================
  Files              ?      468           
  Lines              ?    65989           
  Branches           ?    10186           
==========================================
  Hits               ?    44700           
  Misses             ?    18441           
  Partials           ?     2848

Flag	Coverage Δ
GPU	`77.86% <100.00%> (?)`
XPU	`7.02% <0.00%> (?)`

Flags with carried forward coverage won't be shown.


PaddlePaddle-bot · 2026-06-03T09:30:32Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-03 17:29:03 UTC+08:00

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 2276319
Merge base: 529ec9e (branch: develop)
View full diff
CI details

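1 Job overview

Required jobs currently show 2 failed, 0 running, and 0 pending, so merging is not recommended yet. The main unit-test failure is caused by the PR code. The XPU 8-card failure shows up as a Decode node health-check timeout and looks more like an environment or service-startup problem.

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
41(0)	41	36	4	0	1	0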

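2 Job status summary

Note on the log column: failed jobs use the pre-generated log_links_markdown field directly; for running jobs the [Job]({html_url}) link is assembled manually.

2.1 Required jobs: 8/10 passed

Required jobs block merging, so their failures should be handled first.

Status	Job	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	1h26m	PR issue: the newly added max_tokens metric has no None check	Check sampling_params for None before reporting	Job	-
❌	`xpu_8cards_case_test / run_xpu_8cards_cases`	18m38s	Environment issue: XPU Decode health check timed out	Environment issue, please rerun	Job	-
✅	The other 8 required jobs passed	-	-	-	-	-

2.2 Optional jobs: 28/31 passed

Optional jobs do not block merging; their failures are for reference only.

Status	Job	Duration	Log	Rerun
❌	`Check PR Template`	23s	Job	-
❌	`Trigger Jenkins for PR`	7m46s	Job	-
⏸️	`CI_HPU`	-	-	-
✅	The other 28 optional jobs passed	-	-	-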

3 Failure details (required jobs only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: test failure
Confidence: high
Root cause summary: the newly added max_tokens metric has no None check
Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

Test	Error	Root cause
`tests/pooling/test_Qwen3-Embedding_serving.py::test_single_text_embedding`	AttributeError/HTTP 500	`sampling_params` is `None` for embedding requests
`tests/pooling/test_Ernie4_5_reward_serving.py::test_reward_model_with_caching`	HTTP 500	reward/pooling requests carry no generation sampling parameters
`tests/engine/test_common_engine.py::test_insert_zmq_task_normal_request_with_worker_pid`	AssertionError	the new metrics call raises before the subsequent scheduling/mapping logic runs

Root cause details:
The PR moves request_params_max_tokens and related metrics from engine_client.py to common_engine.py. However, when pooling_params is present, Request.from_dict sets request.sampling_params = None, so the newly added access to request.sampling_params.max_tokens raises AttributeError and embedding/reward requests return HTTP 500. The metrics mock in the unit test also does not cover the new metrics, and once the exception is raised the subsequent trace, pause, and worker_pid logic never runs.

Key logs:

File "/workspace/FastDeploy/fastdeploy/engine/common_engine.py", line 1344
  main_process_metrics.obs_value("request_params_max_tokens", request.sampling_params.max_tokens)
AttributeError: 'NoneType' object has no attribute 'max_tokens'

Suggested fixes:

fastdeploy/engine/common_engine.py:1344: check that request.sampling_params is not None before reporting request_params_max_tokens; for pooling/reward/embedding requests, either skip this metric or use the original max_tokens default from the request dict.
tests/engine/test_common_engine.py: extend DummyMetrics to mock prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens, and add coverage for sampling_params is None.

Fix summary: check sampling_params for None before reporting
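
The None check suggested here can be sketched as follows. This is a minimal, hypothetical sketch and not the actual FastDeploy code: `RecordingMetrics` and `report_prompt_metrics` are illustrative names, `inc_value` is an assumed counterpart to the `obs_value` call shown in the log, and the requests are plain `SimpleNamespace` stand-ins.

```python
from types import SimpleNamespace


class RecordingMetrics:
    """Hypothetical stand-in for the main-process metrics collector."""

    def __init__(self):
        self.calls = []

    def inc_value(self, name, value):
        # Assumed counter-style API; only obs_value appears in the CI log.
        self.calls.append(("inc", name, value))

    def obs_value(self, name, value):
        self.calls.append(("obs", name, value))


def report_prompt_metrics(metrics, request):
    """Report prompt metrics, skipping max_tokens when there are no sampling params."""
    metrics.inc_value("prompt_tokens_total", request.prompt_token_ids_len)
    metrics.obs_value("request_prompt_tokens", request.prompt_token_ids_len)
    # Pooling/embedding/reward requests have sampling_params set to None.
    sampling_params = getattr(request, "sampling_params", None)
    if sampling_params is not None:
        metrics.obs_value("request_params_max_tokens", sampling_params.max_tokens)


metrics = RecordingMetrics()
generation = SimpleNamespace(
    prompt_token_ids_len=2, sampling_params=SimpleNamespace(max_tokens=16)
)
pooling = SimpleNamespace(prompt_token_ids_len=3, sampling_params=None)
report_prompt_metrics(metrics, generation)
report_prompt_metrics(metrics, pooling)  # no AttributeError for pooling requests
```

Skipping the metric is only one of the two options named in the suggestion; falling back to a default max_tokens from the request dict would need a different branch.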

Related changes: fastdeploy/engine/common_engine.py:1339-1344, fastdeploy/entrypoints/engine_client.py:371-376
Link: view logs

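xpu_8cards_case_test / run_xpu_8cards_cases: timeout (confidence: medium)

xpu_8cards_case_test / run_xpu_8cards_cases

Status: ❌ Failed
Error type: timeout
Confidence: medium
Root cause summary: XPU Decode health check timed out
Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

Test	Error	Root cause
`tests/xpu_ci/8cards_cases/test_pd_21b_ep4tp1.py::test_pd_separation`	Failed: PD分离服务启动失败 (PD-disaggregated service failed to start)	Decode node not healthy within 600 seconds

Root cause details:
This job failed while the PD EP4TP1 service was starting. During the health check the P node returned 200 throughout, but the D node stayed at 000 from 0 to 591 seconds, which finally triggered pytest.fail("PD分离服务启动失败") ("PD-disaggregated service failed to start"). The logs do not contain the sampling_params.max_tokens exception seen in the main unit tests, and the later EP4TP4 cases started and passed, so this looks more like an intermittent XPU Decode service-startup or environment problem.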
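
The health-check behavior described here (poll every node until all return 200 or the time budget runs out) can be sketched as below. This is an illustration only, not the CI script: `wait_until_healthy`, its parameters, and the probe callables are hypothetical names, and the injectable `sleep` and `clock` exist so the loop can be exercised without waiting.

```python
import time


def wait_until_healthy(probes, timeout_s=600.0, interval_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll every probe until all return 200, or raise TimeoutError.

    probes maps a node name (e.g. "P", "D") to a zero-argument callable
    that returns that node's latest HTTP status code.
    """
    deadline = clock() + timeout_s
    while True:
        statuses = {name: probe() for name, probe in probes.items()}
        if all(code == 200 for code in statuses.values()):
            return statuses
        if clock() >= deadline:
            raise TimeoutError(
                f"service not healthy after {timeout_s:.0f}s: {statuses}"
            )
        sleep(interval_s)


# Simulated clock so the example finishes immediately.
_now = [0.0]


def fake_sleep(seconds):
    _now[0] += seconds


def fake_clock():
    return _now[0]


try:
    # The D node never becomes healthy, mirroring the failed job.
    wait_until_healthy({"P": lambda: 200, "D": lambda: 0},
                       timeout_s=5, interval_s=1,
                       sleep=fake_sleep, clock=fake_clock)
    timed_out = False
except TimeoutError:
    timed_out = True
```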

Key logs:

服务健康检查中... 已等待 591 秒,P节点状态码:200,D节点状态码:000
PD分离服务启动超时:经过 10 分钟服务仍未启动!
tests/xpu_ci/8cards_cases/test_pd_21b_ep4tp1.py:285: Failed: PD分离服务启动失败

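Suggested fixes:

Environment issue, please rerun. If it reproduces, first check the Decode node startup logs and the XPU/RDMA resource state.
Look into the a1_coverage.pth startup hook TypeError and the missing loaded_model_signal warning in the logs, and confirm whether they affect XPU worker startup.

Fix summary: environment issue, please rerun

Related changes: no direct link to the files changed in this PR was found
Link: view logs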

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-03 20:00:34

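📋 Review summary

PR overview: moves the reporting point of three metrics, prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens, from engine_client.py (API process) to common_engine.py (main process).
Scope of changes: fastdeploy/engine/, fastdeploy/entrypoints/, tests/engine/
Affected tags: [Engine] [APIServer]

Issues

Severity	File	Summary
🟡 Suggestion	`tests/engine/test_common_engine.py`	New mock attributes were added without matching assertions, so the tests do not verify the new metric reporting

🟡 Suggestion tests/engine/test_common_engine.py: both test cases (with/without trace_carrier) add the prompt_token_ids_len and sampling_params mock attributes, but the test bodies only assert on trace_set_proc_propagate_context and never verify that prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens are called correctly. The tests only confirm that nothing crashes and do not guard against regressions.

Add the following assertions after the eng._insert_zmq_task_to_scheduler() call (in both test cases):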

# Verify the newly added metric reports
eng.metrics.prompt_tokens_total.inc.assert_called_once_with(2)
eng.metrics.request_prompt_tokens.observe.assert_called_once_with(2)
eng.metrics.request_params_max_tokens.observe.assert_called_once_with(16)

A boundary case with sampling_params=None should also be added to verify that request_params_max_tokens is not reported when there are no sampling_params.
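
A boundary test of this kind might look like the sketch below. It follows the `.inc` / `.observe` mock style of the suggested assertions; `report_prompt_metrics` is a hypothetical stand-in for the reporting step inside `_insert_zmq_task_to_scheduler`, not the real method, and the requests are `SimpleNamespace` objects rather than FastDeploy `Request` instances.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def report_prompt_metrics(metrics, request):
    """Hypothetical stand-in for the metric reporting step under test."""
    metrics.prompt_tokens_total.inc(request.prompt_token_ids_len)
    metrics.request_prompt_tokens.observe(request.prompt_token_ids_len)
    if request.sampling_params is not None:
        metrics.request_params_max_tokens.observe(request.sampling_params.max_tokens)


def test_reports_all_metrics_for_generation_request():
    metrics = MagicMock()
    request = SimpleNamespace(
        prompt_token_ids_len=2, sampling_params=SimpleNamespace(max_tokens=16)
    )
    report_prompt_metrics(metrics, request)
    metrics.prompt_tokens_total.inc.assert_called_once_with(2)
    metrics.request_prompt_tokens.observe.assert_called_once_with(2)
    metrics.request_params_max_tokens.observe.assert_called_once_with(16)


def test_skips_max_tokens_without_sampling_params():
    metrics = MagicMock()
    request = SimpleNamespace(prompt_token_ids_len=2, sampling_params=None)
    report_prompt_metrics(metrics, request)
    # Prompt metrics are still reported for pooling-style requests.
    metrics.prompt_tokens_total.inc.assert_called_once_with(2)
    metrics.request_prompt_tokens.observe.assert_called_once_with(2)
    # max_tokens has no meaning here, so nothing should be observed.
    metrics.request_params_max_tokens.observe.assert_not_called()
```

In the real test file the same assertions would be made against `eng.metrics` after calling the engine method, assuming that attribute is the mock the test installs.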

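Status of previous findings

Finding	Issue	Status
F1	PR title uses the unofficial tag `[Metrics]`	⚠️ Still present
F2	PR description sections were empty	✅ Fixed

📝 PR convention check

The PR title uses [Metrics], which is not in the official FastDeploy tag list; use [Engine] instead.

Suggested title (ready to copy):

[Engine] Move prompt_tokens_total metrics report to main process

Suggested PR description (ready to copy):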

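## Motivation
Move the reporting point of the `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` metrics from `engine_client.py` (API process) to `_insert_zmq_task_to_scheduler` in the `common_engine.py` main process. The metrics are then recorded only when a request actually enters the scheduler, which gives them more accurate semantics, and the cross-process dependency of writing to the main-process metrics object from the API process is removed.

## Modifications
- `fastdeploy/engine/common_engine.py`: report `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` where `_insert_zmq_task_to_scheduler` enqueues the request
- `fastdeploy/entrypoints/engine_client.py`: remove the reporting of these three metrics and the corresponding `main_process_metrics` import

## Usage or Command
pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py

## Accuracy Tests
N/A. This PR only moves where the metrics are reported and does not affect model output logic.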

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests. (Tests updated in test_common_engine.py)
- [x] Provide accuracy results. (N/A, no change to model output)
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

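Overall assessment

The code change is concise, the direction of the metric migration is right, and the logic is correct. The tests add the necessary mock attributes but lack the matching assertions; adding them would make the tests more effective. The PR title tag should be changed to the official tag [Engine].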

[Metrics] move prompt_tokens_total report to main process

2276319

liyonghua0910 had a problem deploying to Metax_ci June 2, 2026 12:10 — with GitHub Actions Failure

TBD1 previously approved these changes Jun 3, 2026

[test] fix ci

3244a3d

liyonghua0910 dismissed TBD1’s stale review via 3244a3d June 3, 2026 11:44

liyonghua0910 had a problem deploying to Metax_ci June 3, 2026 11:45 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 3, 2026

[Metrics] move prompt_tokens_total report to main process #7982
liyonghua0910 wants to merge 2 commits into PaddlePaddle:develop from liyonghua0910:develop+20260602_prompt_tokens_total

4 participants
